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ABSTRACT 

Many  machine  languages  have  different  instruction  formats  which 
allow  addressing  of  "nearby"  operands  with  a "short"  instruction 
(e . g . one  word),  while  "faraway"  operands  require  a "long"  format 
(e . g . two  words).  Because  the  size  of  object  code  may  depend  upon 
the  formats  used,  the  formats  of  different  instructions  may  be 
dependent  on  each  other. 

An  efficient  algorithm  is  given  for  optimally  assigning  formats 
to  instructions  in  a given  program,  and  an  implementation  will  be 
discussed  which  is  practical  in  space  as  well  as  time.  The  more 
sophisticated  problem  of  arranging  operands  within  programs  is 
discussed.  Unfortunately,  it  is  unlikely  that  an  efficient 
algorithm  can  even  guarantee  good  approximations  for  this  problem. 
Finally,  implications  of  this  problem  on  hardware  and  software 
designs  are  considered. 
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I . Int  roduct ion 

One  aspect  of  machine  organization  which  occurs  quite 
frequently  in  mini-  and  mic ro -computers  is  a variety  of  address- 
ing modes  which  require  different  instruction  lengths  in  one 
machines.  The  most  straightforward  example  is  perhaps  in  IBM 
1130,  in  which  instructions  may  be  either  "short  - format" , ±127 
words  (actually  +127  words,  -128  words  due  to  two ' s -complement 
representation  - we  will  occasionally  ignore  this  detail  in 
the  future)  from  the  current  value  of  the  program  counter,  or 
"long  - format" , with  an  entire  16  bit  word  containing  a direct 
memory  address.  Thus  a reference  or  branch  to  a "nearby"  loca- 
tion requires  a one-word  instruction,  while  a reference  "far- 
away" requires  two  words.  If  we  consider  housekeeping  operations 
necessary  to  establish  addressability  (of  information  already 
present  in  memory,  as  opposed  to,  say,  paging  overhead),  then 
even  such  a large  machine  as  an  IBM/370  exhibits  these  differences 
in  addressing  cost.  Table  I compares  the  addressing  mode  varia- 
tions of  a number  of  machines. 
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TABLE  I: 

Addressing  mode  variations 


machine 

"nearby" 

"far-away" 

Motorola  6800 

bytes  0 thru  255 
("direct")  , PC+127 
("relative")  , or 
index  register  + 8- 
bit  displacement 

16  bit  byte  address 

PDP-8 

128  word  "pages", 
can  address  page  0 
or  current  (PC) 
page 

Indirect  addressing  of 
4096  word  "fields," 

3 bit  field  pointers 
("very  faraway") 

Microdata  1600/30 

Can  address  page  0 
(256  words)  or 

PC+127 

15  bit  direct  address, 
can  generate  16  bit 
with  use  of  index 
register 

IBM  1130 

PC±127  or  index 
register  +127 

16  bit  address  (index- 
ing and  indirection) 

PDP-11 

PC±127  words 
( BRANCHES  ONLY ) 

16  bit  byte  address 
(indexing,  indirec- 
tions, etc.) 

PRIME  400 

Displacement  of  up 
to  255  words  from 
stack  or  PC  base 
registers 

16  bit  displacement 
from  procedure  base, 
also  various  indirec- 
tions (16,  32,  or  48- 
bit  pointers) 

INTF.RDATA  8/32 

14  bit  direct  and 

15  bit  PC-relative 
address  (indexing 
allowed) 

24  bit  address  (8/32 
uses  20  bits;  single 
and  double  indexing) 

IBM/360  and  370 

Up  to  15  (usually 
fewer)  base 
register  pointing 
to  4096-byte  blocks 

Must  load  a new  base 
register  value 

(PC  indicates  the  current  value  of  the  program  counter,  which 
points  to  the  location  of  the  instruction  being  executed.) 


When  generating  code  for  machines  with  different  address  modes,  it 
is  of  course  desirable  to  use  the  shorter  formats  whenever  possible.  This 
not  only  decreases  the  program  space  requirement,  but  may  also 
decrease  execution  time,  since  an  additional  fetch  cycle  is  usually 
necessary  if  a longer  addressing  mode  is  used.  This  concern  is  therefore 
a code  optimization  problem,  which  may  be  approached  either  in  a basic  way, 
using  the  minimal  muter  of  long  addresses  in  a fixed  program,  or  in  a 
more  sophisticated  way  with  program  rearrangement.  The  principle  activity 
in  the  later  case  would  be  moving  variables  "nearby"  locations  where  they 
are  referenced.  As  is  often  the  case,  the  simple  optimization  is  easy, 
while  the  sophisticated  one  is  hard. 

The  particular  model  we  adopt — of  course  the  simplest  one  available- - 
has  two  addressing  formats,  which  wo  will  call  "short”  and  "long”.  All 
instructions  are  single  address,  with  a short  instruction  (i.e.  an  instruc- 
tion whose  address  is  in  the  short  format)  occupies  one  word  (unit)  while 
a long  instruction  occupies  two  words . A long  address  may  refer  to  any 
location  in  memory,  while  a short  instruction  may  refer  only  up  to  a fixed 
distance  l forward  or  backward  from  the  instruction  containing  it.  Al- 
though this  is  a sinple  model,  it  corresponds  quite  closely  to  an  IBM  1130 
(as  described  above) . 

The  notation  for  this  model  will  consist  of  identifying  labels  wri-th 
data  words  and  vrith  instructions  and  their  operands,  so  that  a:  8 wall  represent 
an  instruction  labelled  a which  has  the  word  labelled  8 as  an  operand. 

Note  that  a: 8 is  an  affirmative  statement  about  the  instruction  labelled 
a.  Of  course,  each  instruction  has  a unique  label  but  a label  may  occur  in 
many  operands.  The  occurrences  of  labels  on  instructions  defines  an  order 
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(<)  on  these  labels  fwhich  is  a partial  to  the  extent  that  the  program 
consists  of  posit  ion- independent  modules).  The  distance  between  two  instruc- 
tions on  labels  is  the  number  of  words  between  the  instructions  fa  word  is 
distance  0 from  itself,  +1  irom  its  immediate  predecessor  or  successor). 

If  an  assembler,  compiler,  or  other  code -mod i fving  procedure  is  changing 
instruction  formats,  distance  is  then  dependent  upon  the  time  or  stage  of 
the  procedure,  since  the  number  of  words  between  two  instructions  may  depend 
upon  whether  intervening  instructions  are  long  or  short.  However,  we  will 
always  begin  presuming  that  all  instructions  are  short;  thus  the  initial 
distance  will  be  defined  to  take  this  into  account,  i.e.,  the  number  of 
instructions  (plus  data  words)  between  two  instructions.  If  a and  6 are 
labels,  then  di(a,8)  is  their  initial  distance  and  d(a,8)  is  the  distance 
at  some  (here  unspecified)  time.  An  instruction  a:8  is  in  the  span  of 
y r <5  if  y < a < 6 or  6 < a < y.  In  this  case  we  also  say  y : 6 is  dependent 
on  a: 8 - that  is,  d(y,6)  changes  if  the  format  of  a: 8 does. 

A simple  graphical  notation  is  often  more  convenient  to  represent 
structures  of  labels  and  instructions.  We  will  draw  memory  as  a series  of 
cells  across  a page,  with  address  labels  written  above  the  cells  and  operand 
addresses  indicated  by  arrows.  Thus,  figure  1 represents  the  following  facts: 
a:B,  6:y,  y < a < 8,  d(a,y)  = 1,  and  d(8,yj  = k.  Observe  that  the  words 

intervening  between  8 and  y are  not  indicated,  but  they  are  assumed  to 

\ 

have  fixed  size.  The  instruction  8:y  is  dependent  on  a:6. 
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Figure  1 


II.  In-place  Code  Optimization 

The  simplest  type  of  optimization  problem  on  short/long  address  machine 
is  to  choose  exactly  those  instructions  which  must  have  long  addresses  in  a 
fixed  piece  of  source  code.  This  optimization  could  be  performed,  say,  by 
an  assembler  for  such  a machine.  It  is  clear  that  such  an  optimum  uniquely 
exists:  if  the  arrangement  of  instructions  and  data  is  fixed,  making  an 
instruction  unnecessarily  long  cannot  serve  to  shorten  any  others,  although 
it  may  in  fact  force  others  to  be  long.  Indeed,  Figure  2 shows  a case  in 
which  two  instructions  are  arranged  so  that  both  must  be  short  or  both  must 
be  long.  This  is  why  the  initial  distance  is  defined  with  all  instructions 
short--the  following  algorithm  will  not  expand  such  situations  unnecessary, 
but  detecting  them  in  order  to  collapse  them  could  be  difficult. 
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Figure  2:  (Recall  £ is  maximum  short  address) 

Theoran  1 A segment  of  fixed  code  may  be  optimized  for  short/ long  addresses 
in  time  proportional  to  n*£,  where  n is  the  length  of  the  code  and  £ is 
maximum  short  address. 

A simple  algorithm  tdaich  almost  suffices  is  to  make  repeated  passes 
over  the  code,  expanding  on  each  pass  only  those  instructions  vtfiich  require 
it,  terminating  after  a pass  in  which  no  instructions  are  expanded.  However, 
this  may  require  about  n/£  passes  thru  the  program  if  instructions  are 
chained  as  in  the  pathological  case  of  Figure  3,  where  expanding  6 forces 


Y to  expand  which  forces  0 to  expand  ... 


Figure  3 

Such  pathological  cases  require  a better  algorithm  for  theoretic  reasons; 

but  practicality  also  calls  for  a better  algorithm  not  requiring  many 

passes  over  the  entire  program,  which  is  very  likely  stored  out  on  a 

disk  file.  A number  of  systems,  however,  have  made  use  of  this  multi-pass 

algorithm.  The  BLISS  compiler  uses  this  technique,  along  with  others,  to 

optimize  branch  instructions  on  PDP-11  [T3,  p.  119].  Interdata's  CAL 

assembler  will  make  multiple  passes  if  the  SQULZ  option  is  specified, 

although  the  number  of  passes  is  limited  by  a parameter  on  the  option  and 

assuming  an  initial  long  format  thus  missing  cases  as  in  Figure  2 | 5,  p.  53]. 

The  UNIX  assembler  does  three  passes,  also  beginning  with  an  assumed  long  format. 

Linear  algebraic  solutions,  of  high  computational  cost,  are  presented  in  [2  and 

2 

9].  Szymanski  [12]  gives  an  algorithm  which  is  similar  in  flavor  but  n in  time. 

The  pathology  of  Figure  3 could,  of  course,  be  handled  by  a single 
right-to-left  pass.  Iloi  ever,  the  pathology  can  be  elaborated  to  confound 
an  alternating-pass  scheme  as  well.  The  technique  requires  a single  pass 
over  the  program  to  gather  dependency  information  (called  "initialization" 
below)  and  a single  traversal  through  a generated  data  structure  in  a 
method  reminiscent  of  topological  sort  [ 8] . 


ALGORITHM  1 


data  structures: 
for  each  label  a, 

a count  C[a]  and  a list  of  labels  L[a]; 
a queue  Q; 

/* initial izat ion*/ 

for  each  label  a,  such  that  a: 6, 

{set  CJy]  •*-  d(a:  8) ; 

for  each  y:6  such  that  d(v,<5)  _f_  £ which  is 
dependent  on  a,  place  y on  L[a]; 
if  C[a]  > £ then  place  a on  Q 
}; 

/*inain  loop  - marking  instructions  as  long  or  short*/ 

for  each  a on  Q 
{ output  a ; 
for  each  y on  L[a] 

{CM  - c[y]+i; 

if  C [ y ] = £+1  then  place  y on  Q 

} }. 

The  output  from  this  algorithm  is  the  set  of  labels  of  instructions  which 
must  be  in  long  format.  The  desired  time  bound  is  obtained  on  the  main  loop 
by  observing  that  there  are  at  most  n labels  and  that  L[n]  has  at  most 
2 £ elements  because  of  the  test  d(a,<5)  <_  £ in  the  initialization  loop. 

The  initialization  may  also  be  implemented  in  a way  which  is  linear  in  n-£, 
by  only  searching  for  y within  £ of  a on  a single  pass. 

The  above  algorithm  (or  sketch)  has  the  disadvantage  of  producing  a 
potential  enormous  data -structure,  with  order  of  n-£  entries.  However, 
this  sketch  may  be  fleshed-out  into  a fairly  efficient  program.  Rather  than 
making  initialization  and  marking  distinct  phases,  they  are  combined  in  one 
pass.  This  pass  is  really  an  elaboration  of  the  initialization,  with  marking 
done  "on  the  fly".  Fairly  simple  heuristics  - such  as  checking  whether  an 
instruction  is  unquestionably  long  or  short  - are  used  to  greatly  reduce  the 
number  of  labels  for  which  lists  must  be  created  and  the  number  of  entries 
on  these  lists.  Moreover,  elements  are  removed  from  these  lists  at  the  first 
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opportunity,  reducing  the  intermediate  storage  required.  A complete  algorithm 
incorporating  these  heuristics  is  given  in  the  appendix. 

.An  algorithm  based  on  similar  ideas  is  given  by  Szymanski  [12].  He  also 
notes  that  the  algor ithm  will  not  work  if  operands  of  the  form 

label  + constant 

are  allowed.  This  is  because  lengthing  one  instruction  may  cause  another 
to  be  closer  to  rather  than  farther  from  its  operand.  This  happens  in  the 
pathological  case  wrhere  the  addressed  operand  is  on  the  other  side  of  an 
instruction  from  the  label  specified  in  the  operand  - as  in  Figure  4 where 
lengthening  y causes  d(a,B)  to  increase  but  moves  the  operand  at  6 + v 
closer  to  a.  Indeed,  Szymanski  shows  that  problem  of  optimizing  code  with 
such  pathologies  is  wp-complete.  To  call  a practice  such  as  this  "bad  coding" 
is  certainly  a severe  understatement  and  any  assembler  should  flag  such 
usage  as  an  error.  This  does  point-out,  however,  the  failure  in  this  case 
of  the  common  assembler  convention  that  the  difference  of  two  relocatable 
terms  is  an  absolute  term. 


By  a 


III.  Optimization  with  Storage  Allocation  is  Hard 

Knowing  that  it  is  easy  to  optimize  a program  if  no  rearrangement  of 
storage  is  possible,  it  is  natural  to  ask  whether  rearranging  storage  may 
produce  a shorter  program  and  how  difficult  it  is  to  perform  such  rearrange- 
ments optimally.  The  answer  is,  of  course,  that  rearrangements  may  shorten 
a program  but  that  finding  an  optimal  arrangement  is  difficult.  There  are 
situations  where  this  optimization  is  inappropriate  or  undesirable.  Placing 
data  and  code  together  violates  the  tenents  of  "pure"  code,  making  reentrant/ 
recursive  programming  difficult  to  impossible.  There  are,  however,  occasions 
where  it  is  highly  appropriate,  as  in  a compiler  for  a systems  programming 
language  for  a single-user  minicomputer  system  [4  ,10]. 

The  situation  we  will  consider  is  as  above  except  that  there  are  certain 
(one-word)  variables  whose  location  in  the  program  body  is  not  fixed  a priori 
and  certain  places  in  memory  (for  example,  following  an  unconditional  branch) 
where  these  variables  may  be  inserted.  If  a variable  is  referenced  in  only 
one  brief  segment  of  code,  placing  it  near  that  segment  could  result  in  all 
references  to  it  being  short;  placing  it,  say,  at  the  end  of  the  program  would 
require  all  references  to  be  long.  In  our  graphical  notation,  a reference 
to  such  a variable  will  be  indicated  by  an  arrow  pointing  directly  at  a label 
(rather  than  a cell)  and  a slot  where  variables  may  be  inserted  will  be 
indicated  by  a heavy  vertical  bar  (see  Fig.  5).  Also,  a number  m in  a cell 
will  denote  m adjacent  occurrences  of  instructions  with  the  same  operand. 

The  problem  of  finding  the  optimal  placement  of  variables  in  slots  is 
shown  to  be  hard  is  a sense  which  has  already  become  classic  in  its  brief 
lifetime:  the  problem  is  polynomially  complete  in  np  or,  more  briefly, 
/vp-complete.  The  wp-complete  problems  include  many  from  combinatorics, 
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operations  research,  etc.  whose  best -known  algorithms  require  exponential 


time.  They  are  all  related  in  that  discovery  a polynomial -time  algorithm 
for  any  of  the  problems  would  provide  a polynomial  time  solution  to  all  of 
them,  and  hence  such  an  efficient  algorithm  is  highly  unlikely  [1,7]. 

In  order  to  express  this  notion  more  exactly  we  define  a problem 
<S,P)  to  be  a set  S (of  instances  of  the  problem)  and  a predicate  P 
defined  on  the  members  of  S.  The  set  of  the  code  generation  problem 
(designated  S/h-CODF.)  has  instances  of  the  form  < prgm,Z,k>,  where  prgm 
is  a source  program  and  l and  k are  integers.  The  predicate  for  code- 
generation (designated  FITS)  is  then  true  if  prgm  may  be  translated  into  an 
object  program  with  short  address  span  l which  requires  k or  fewer 

instructions  in  long  format.  An  optimal  object  program  then  requires 
the  minimun  k such  that  the  predicate  is  true  of  <prgm,2.,k>.  A 
problem  <S,P>  is  said  to  be  (VP-complete  provided  that  1)  for  s eS, 
it  may  be  nondeterministically  (by  guessing)  discovered  that  P(s)  is 
true  in  time  polynomial  in  the  size  of  s , and  2)  for  some  fixed  pro- 
blem <L,T>,  from  a known  class  of  problems,  there  is  a translation 
tr :L-*-S  which  operates  deterministically  in  polynomial  time,  such  that, 
for  each  x e L,  P(tr(x))  iff  T(x).  The  nondeterministic  solution  to  the 
code  generation  problem,  for  an  instance  < prgjn, 2,,k>,  first  guesses  the 
placement  of  variables,  and  then  produces  object  code  according  to 
Algorithm  1,  checking  whether  the  resulting  object  code  does  indeed  fit 
in  k wrds.  Thus  short/ long  code  generation  satisfies  the  first  criterion. 

In  order  to  show  the  second  criterion,  we  need  a fixed  problem  for 
reference,  and  we  choose  <CNF-3,SAT>,  where  CNF  is  the  set  of  all  for- 
mulas of  the  propositional  calculus  which  are  in  Conjunctive  Normal  Form 
wath  at  most  3 literals  per  clause.  That  is  a formula  in  CNF- 3 is  of  the 
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form  C. . . . fjC  , where  each  C.  is  a clause  of  the  form  A,  vA-,vA, 

1 l m 1 1 l 3 

(or  A^  or  A^vA^) , where  A^  is  a literal,  that  is  either  IV  or  V for 
some  variable  V.  F e CNF-3  is  satisfiable  (SAT(F)  is  true)  if  there  is 
an  assignment  of  truth  values  to  the  variables  of  F such  that 

F is  true.  This  problem  is  known  to  be  wp-complete  [7]. 

We  take  advantage  of  the  fact  that  the  conjuncts  (Vv~TW)  and  PVvW) 
appearing  in  F force  V and  W to  have  the  same  value  in  any  assignment 
of  values  satisfying  F.  Thus  if  a variable  appears  in  more  than  five 
conjuncts,  we  may  replace  three  occurrences  of  V by  some  new  variable  W 
and  introduce  the  new  conjuncts  (Vv~IW)  and  pVvW) . This  process  is 
iterated  until  no  variable  appears  in  more  than  five  conjuncts.  The  resulting 
formula  is  satisfiable  iff  the  original  one  was.  It  is  no  more  than  three 
times  the  length  of  the  original.  Thus,  without  loss  of  generality,  wc 
assume  that  each  formula  of  CNF-3  has  no  variable  appearing  in  more  than 
five  conjuncts.  Also  we  may  easily  assume  that  both  a variable  and  its 
negation  do  not  appear  in  the  same  conjunct. 

Theorem  2.  The  problem  < S/L-CODE,FITS>  is  wp-complete.  Moreover,  we  may 
restrict  the  maximum  short  address  length  i to  a fixed  size  and  the 
problem  is  still  np- complete. 

Proof:  From  the  remarks  above  it  suffices  to  show  a translation: 

CNF-3  ->  S/L-CODK  such  that  F e CNF-3  is  satisfiable  iff  the  corresponding 
program  fits  in  the  given  space. 

Pick  an  arbitrary  formula  F e CNF-3,  say  that  F is  C^C7P . .^C  , 

where  each  conjunct  C.  is  A.  .vA.  ~vA.  T (or  A.  , or  A.  ,vA. 

J J J,2  j ,3  j,l  J,1  j ,2 

which  are  treated  similarly  and  thus  will  not  be  distinguished).  The 
variables  of  F are  V-.V-,...^  , and  for  each  V.,c.  ,,c.  c.  - 

I Lt  fl  ] 1 , j J p tm  1 | •) 


♦ 


\ 
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(possibly  fewer)  are  the  ccnjuncts  in  which  appears . We  first  con- 

struct the  program,  which  actually  will  be  built-up  of  independent 
"modules"- -tWD  modules  for  each  and  one  for  each  Ch . The  position 

of  these  modules  does  not  influence  the  size  of  the  object  code  in  any 
way. 

Figures  5 shows  one  of  the  modules  (the  f-module)  corresponding  to 

variable  V^.  The  other  module  (the  t-module)  is  exactly  the  same  except  the 

occurrences  of  a.  • and  cT.  . are  reversed.  Note  that  the  distances 
> J > J 

shown  are  initial  distances  and  that  2,-1  is  the  initial  distance  from 
the  rightmost  of  the  three  instructions  pointing  to  to  the  slot  on 
the  left. 


£ — > 
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vhere  t is  a.  „ if  V.  is  a literal  of  C 
r i c . i c . 

_ ’ i,r  i,r 

ai,c.  if  IV.  is  a literal  of  C 
’ i,r  l c. 

i,r 

Figure  5 

Since  is  referenced  only  hy  the  f-  and  t -modules  of  and  since 

it  may  be  placed  in  the  slot  of  either  module  without  additional  cost  to 
that  module,  must  be  placed  in  cne  of  these  modules  in  any  optimal 


t 


arrangement.  We  shall  see  that  the  placement  of  in  's  t-module 

corresponds  to  assigning  the  truth  value  true  to  V. , and  similarly  the 

f- module  corresponds  to  false . If  in  is  placed  in,  say,  the  f-module 

thoi  the  variables  labelled  on  • or  5.  • may  be  placed  in  the  same 

slot  so  that  all  references  to  these  variables  from  the  module  are  short. 

In  any  optimal  arrangement,  the  variables  labelled  a.  . or  a.  . must 

i.J  i»  J 

be  placed  in  the  f-module  if  such  is  the  case,  as  will  be  shown  later.  On 
the  other  hand,  if  in  is  again  placed  in  the  f-module,  then  all  instructions 
referencing  it  from  the  t-module  must  use  long  addresses.  Thus  16  addi- 
tional words  are  introduced  between  the  slot  of  the  t-module  and  the 
instructions  referencing  a.  . or  a.  . , forcing  all  the  corresponding 
addresses  to  be  long  in  any  case.  Since  the  two  modules  corresponding  to 
Vf.  are  symmetric,  the  final  conbined  length  of  these  modules  in  an  optimal 
arrangement  is  independent  of  the  placement  of  in  . 

New  we  turn  to  the  module  (c-module)  corresponding  to  conjunct  C. 
which  is  shewn  in  Figure  6,  with  Ch  equal  to  A^vA^A.^.  First 


where  6r  is  ^ if  Aj.  is 

Br  13  “i.J  if  \ is_,Vi 

and  Br  is  similar  with  bars  reversed 


Figure  6 


observe  that  any  a.  . or  i.  . is  referenced  once  in  a c -module  but  three 
»,J  i,J 

times  in  a t-  or  f-module.  Moreover,  omitting  a.  . from  a c-module  can 

cause  at  most  one  additional  instruct  ion,  y^,  to  expand,  Thus,  if  possible, 

the  variables  labelled  « and  a-  must  be  placed  in  a t-  or  f-module 

i.J  i.J 

for  the  arrangement  to  be  optimal . We  have  seen  that  this  only  occurs  if 

the  variable  v-  is  placed  in  the  f-  or  t -module,  respectively.  Since 

a-  • appears  in  the  f-module  of  V.  iff  a.  . appears  in  the  t-module  land 
i.J  i i,J 

V.  is  in  Cl,  exactly  one  of  the  variables  a-  • or  a.  . is  available 
i J i,)  i*J 

for  placement  in  the  c-module  of  C;.  Thus,  exactly  one  of  the  corresponding 

or  8^  is  long  and  the  final  length  of  the  c-module  in  an  optimal 

arrangmenet  is  fixed  except  for  the  cell  y j . 

Again,  consider  the  case  that  v.  is  placed  in  the  t-module  of  VL, 

and  say  V.  is  a literal  of  C..  Then,  as  noted  above,  a.  . is  available 

for  placement  in  the  c-module  of  Cj . Thus  d(yj,6j)  <_  l and  hence  Yj 

may  be  short.  If,  however,  none  of  the  variables  corresponding  to  Pj, 

or  6,  for  C.  are  available  for  placement  in  the  c-module,  the 
^ J 

d ( y j > 6 j ) = £+l  and  Yj  must  be  long.  As  noted  above,  placement  of 
in  V.'s  t-module  corresponds  to  choosing  true  for  V^;  and  after 
all  these  assignments  are  made,  6j  is  short  iff  Cj  is  true  and  F 
is  satisfied  iff  all  6j  are  short.  Let  prgm  be  the  program  constructed 
and  let 


k = 16-n 


(for  v's) 


m 


♦ 4*  l (number  of  literals  in  C.)  (for  a's  and  B’sl. 

j-1  J 

Then  SAT(F)  iff  F ITS (prgm, I, k) . 

Inspection  of  Figures  5 and  6 will  show  that  an  t of  31 (=  2^-1) 
is  sufficient.  □ 
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One  may  suspect  that  the  constraint  that  variables  can  only  be 
placed  in  specified  slots  makes  the  problem  difficult.  However,  this  is 
not  the  case.  We  define  two  variations  on  the  rules  governing  the  arrange- 
ment of  program.  FITSa  (FITS  with  arbitrary  placement)  is  true  of 
<prgm,  i ,k>  if  the  object  program  may  be  arranged  as  before  except  that 
variables  may  be  inserted  anywhere  in  the  program  text,  and  the  resulting 
program  requires  k or  fewer  words  in  long  format.  Realistically  we 
should  charge  an  extra  word  for  branching  around  the  inserted  variable, 
but  it  is  not  necessary  in  the  following  result.  It  will  be  necessary  to 
charge  for  branching  in  FITSb  (FITS  with  branching)  where  the  code  may  be 
rearranged  to  bring  it  near  the  variables  which  are  referenced.  Figure  7 
illustrates  a situation  where  this  may  be  an  advantage.  The  blocks  labelled 


< > 2 1 > 


Figure  7 


a and  S both  reference  the  variable  u , but  they  are  too  far  apart 
for  u to  be  referenced  by  short  instructions  for  both.  However,  these 
blocks  could  be  rearranged  as  in  Figure  8,  with  br  indicating  branch 
instructions.  FITSb  (<prgn,«,,k>)  is  true  if  prgm  fits  in  k words  of 
object  code  with  rearrangements  as  indicated,  provided  that  the  cost  of 
branch  instruction  is  also  accounted  for. 
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Figure  8 


Theorem  3 The  problem  <S/L-C0DE , FITSa>  is  wp-complete. 

Proof : The  proof  is  similar  to  that  of  Theorem  2,  except  that  the  modules 
corresponding  to  the  (Fig.  5)  must  be  modified.  This  modification  is 
essentially  accomplished  by  mirroring  the  module  about  the  slot,  as  shown 
in  Figure  9.  In  addition,  each  reference  to  8r  or  6r  in  a c-module 
(Fig.  6)  is  doubled. 


->2?r-> 
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where  x.  is  a.  , if  V.  is  a literal  of  C 

j i.j  i 'j 

a.  if  -)V  is  a literal  of  C 
i.j  i c. 


Figure  9 


As  before,  placing  in  the  t-module  of  Vh  corresponds  to 
assigning  true  to  V^,  etc.  If  u.^  is  placed  in  the  center  of  the 
module,  then  each  Tj  may  also  be  placed  in  the  center  of  the  module 
in  such  a way  that  it  may  be  addressed  with  short  instructions  from 
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either  end,  thus  "saving"  four  words.  If  the  x.  is  placed  in  its 
corresponding  c-module,  however,  at  most  three  words  can  be  saved,  thus 
forcing  the  placement  in  the  t-  or  f- module  if  possible. 

If,  on  the  other  hand,  xu  is  not  placed  in,  say,  a t-module,  then 
2l  words  will  separate  the  instructions  referencing  Xj  from  the  left 
and  ri$xt  ends  of  the  module.  Thus  placing  x.  in  the  t-module  will 
save  cnly  two  words,  while  at  least  two  words  will  be  saved  by  placing 
Xj  in  the  corresponding  c-module.  Thus,  although  the  placement  of  x. 
is  not  forced  in  this  case,  its  placement  in  the  c-module  is  no  worse. 

The  rest  of  the  analysis  follows  as  in  Theorem  2.  □ 

Theorem  4 The  problem  < S/L- CODE , FITSb  > is  polyncmially  complete  in  NP. 
Proof:  The  proof  is  essentially  that  of  Theorem  3,  except  that  two  in- 
structions referencing  addresses  local  to  the  module  are  inserted  between 
every  reference  to  a variable  u. , a.  • , etc.  This  forces  the  entire 
module  to  remain  together,  since  moving  a segment  containing  intra-module 
references  as  wall  as  references  to  variables  costs  two  words  in  intra- 
module addresses  for  every  word  saved  addressing  a variable,  and  since  only 
moving  references  to  variables  costs  two  branch  instructions  (at  least  two 
words)  for  every  reference. 

Actually  some  care  must  be  taken  in  constructing  the  intra-module 
references.  For  example,  if  all  these  references  pointed  to  just  one 
address,  that  address  could  be  picked  up  and  moved  as  well.  However, 
judicious  construction,  chaining  intra-module  references,  can  solve  this 

problem.  It  is  sufficiently  a bookkeeping  detail  that  it  will  not  be 
further  elaborated  on.  0 
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Althou$i  negative  results  from  computer  science  theory  are  usually 
received  with  distain  or  dismay,  the  last  two  theorems  should  be  cause 
for  some  rejoicing.  Most  computer  programmers  have  already  had  sufficient 
headaches  dealing  with  "optimizing"  compilers  that  they  would  be  most 
dismayed  at  an  algorithm  giving  compilers  license  to  rearrange  their  code 
(even  with  branch  statements  inserted) . 


The  author's  edition  of  Webster's  New  World  Dictionary  (World  Pi±>.  , 1957) 
gives  "to  be  given  to  [or]  to  treat  with  optimism"  as  the  only  definition 
for  "optimize." 
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IV.  APPROXIMATE  SOLUTIONS 

When  we  discover  that  a particular  problem  class  is 
np- complete,  or  in  some  other  way  difficult  to  solve,  it  is  natural 

to  ask  whether  there  is  some  algorithm  which  provides  an  approximate  solu- 
tion ( 3,111  • Certain  problems,  such  as  satisfiability  of  a Boolean  formula, 
have  only  "true"  or  "false"  as  answers  and  hence  do  not  achiit  approximate 
solutions.  However,  if  a problem  is,  say,  one  of  minimisation  it  is 
reasonable  to  ask  how  close  can  we  come  to  the  minimum  with  an  algorithm 
which  is  "inexpensive"  to  run.  In  such  a case,  we  are  interested  in  the 
relative  rather  than  the  absolute  "error".  In  particular,  a problan  is 
said  to  have  an  e -approximate  algorithm  (e  >0)  if,  for  each  instance  of 


the  problem 


$A"$0 


X 


< e 


where  is  the  cost  of  the  solution  provided  by  the  algorithm  and 

is  the  true  optimal  solution.  Since  we  will  concern  ourselves  only  with 
minimization  of  positive  quantities,  the  above  inequality  is  equivalent  to 

Theorem  4.  For  any  e > 0 the  problem  of  finding  e- approximate  solutions 
to  < S/L-CODE,FITS>  is  wp-complete.  Moreover,  i maybe  fixed. 

Proof : Once  again  the  proof  is  by  reduction  to  CNF- 3 with  the  same  restric- 
tions as  before. 

Say  e > 0 is  fixed  and  we  are  given  a formula  F as  above. 

As  before,  we  constrict  a t-module  and  an  f-module  (Figure  101  for 
each  variable  and  a c-module  for  each  clause  Cj  (Figure  11). 
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Whereas  these  modules  had  bet  independent  in  previous  proofs,  they  will 
now  be  linked  together  such  th.  certain  instructions  in  each  module 
reference  operands  in  the  module  immediately  to  the  left.  The  order  in 
which  the  modules  are  arranged  is  of  no  consequence,  but  it  is  important 
that  they  are  linked  properly.  In  particular,  the  initial  instruction- 
operand  distance  of  each  inter-module  instruction  is  exactly  l - this 
condition  is  achieved  by  having  a sufficient  number  of  labels  (maximum  32) 
in  each  module  and  by  providing  sufficient  "filler"  between  modules.  The 
extra-module  instructions  of  the  leftmost  module  are  replaced  by  no-ops 
and  a special  module  will  be  placed  on  the  right  of  the  entire  assemblage. 


■ Z-31 


where  r is  a.  if 

r 1 ,ci ,r 

V. 

l 

is  a literal  of  C 

Ci,r 

a.  if 

1,Ci»r 

> 

r 

is  a literal  of  C 

c. 

the  0's  are  unique  to  each  module 


Figure  10 
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To  analyze  the  effect  of  these  modules  first  observe  that  each  of  them, 
whether  t-,  f-,  or  c-module,  has  one  instruction  (the  check  instruction) 
at  the  vciy  right  (except  intermodule  "filler")  whose  operand  is  at  the  very 
left.  Certain  conditions  on  the  assignment  of  operands  to  a module  must 
he  met  or  the  check  instruction  will  be  forced  long.  In  particular,  the 
check  instruction  of  a module  must  always  be  long  if  all  instructions  with 
intermodule  references  are  long.  This  is  true  because,  in  each  module,  the 
initial  distance  of  the  check  instruction  plus  the  number  of  extra-module 
references  in  2 + 1 , and  because  all  extra-module  references  come  from 
within  the  span  of  the  check  instruction.  Moreover,  since  the  check  instruc- 
tion is  with  the  span  of  all  references  from  the  module  to  the  right,  and 
since  each  of  these  references  from  the  right  has  initial  distance  2, 
making  the  check  instruction  of  one  module  long  forces  the  check  instruction 
of  the  right  neighbor  to  be  long  as  well.  In  this  way  lengthening  one 
check  instruction  propagates  to  lengthening  the  rightmost  check  instruction. 

Consider  now  the  f-module  for  VC,  and  assume  the  check  instruction  of 
this  module  has  not  already  been  forced  long  by  a module  to  the  left.  This 
check  instruction  may  remain  short  if  either  the  operand  v-  or  the  operands 
corresponding  to  each  of  the  xr's  are  assigned  to  the  module.  Since  both 
the  t-module  and  the  f-module  of  VC  reference  v.,  and  since  the  inter- 
module filler  of  length  2 prevents  from  being  near  both  modules,  one 

of  the  modules  must  be  assigned  v.  while  the  other  its  corresponding  tr's. 
Say  is  assigned  to  the  f-module,  then  cu  ^ is  available  for  the 

c-module  of  any  in  which  \C  appears  as  a literal.  And  the  c-module 

for  Cj  must  have  assigned  to  it  at  least  one  operand  corresponding  to  a 
Br,  or  its  check  instruction  will  be  forced  long. 
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To  sum  up  the  construction  so  far,  the  assignment  of  to  the 

corresponding  t-  on  f-module  corresponds  to  an  assignment  of  true  or 
false,  respectively,  to  Vi,  and  the  truth  assignment  satisfies  the  formula 
iff  no  check  instruction  is  forced  long.  If  the  formula  is  unsatisfiable, 
then  the  rightmost  check  instruction  must  be  long. 

Finally  we  a block  of  q instructions (the  exact  value  of  q is 
given  later)  labelled  Yj  ,Y->.  • ■ • ,y^  (Figure  12).  If  is  the  indicated 

label  from  the  rightmost  module,  the  Yj  : 9-^j  and  enough  filler  is  provided 
so  that  di(Yj,0jj)  = fc.  The  l - 1 instructions  have  no 

operands  and  Yr  : Yr_}  for  9.  < r <_  q.  The  placement  and  construction  of 
this  block  is  such  that  Yj  is  forced  long  iff  the  rightmost  check  instruction 
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Figure  12 


is  forced  long;  and  forcing  Yj  long  forces  each  of  V;  + j , • • • ,'r({  to  he 
long  in  turn.  Thus  each  of  these  q - l y's  must  he  long  iff  F is  not 
satisfiable.  Moreover,  either  all  of  these  y's  must  be  long  or  none 
must  be. 

Now  say  the  constructed  program,  before  adding  the  y's,  has  p 
short/ long  instructions.  Then  let 

q > p(l  + e)  ♦ i . 

Now  assume  F is  satisfiable.  Then  an  optimal  solution  has  cost  which 
is  less  than  or  equal  p,  as  does  any  solution  which  does  not  force  the 
y's  to  be  long.  However,  if  the  approximate  solution  forces  the  y's  to 
be  long,  then 

$A  1 q - ^ > p(i  + e)  > $0(1  + e)  . 

Hence  any  e - approx ima te  algorithm  must  provide  a solution  of  cost  less  than 
an  equal  to  p if  the  orginal  F was  satisfiable.  And  this  indeed  would 
provide  a reduction  solving  3-CNF. 

Lixamining  Figure  10,  we  see  that  the  f-  and  t -modules,  which  are 
larger  than  the  c-modules,  may  have  an  initial  length  as  short  as  120, 
and  hence  an  fc  of  151  suffices.  □ 
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Because  of  the  fact  that  an  instruction  in  a short/long  address  machine 
can  he  expanded  at  most  from  one  word  to  two,  there  is  always  an  upper 
hound  of  1 on  the  ratio  |$^  - $o^$oJ  * ^ anc^  ^0  were  ta^en  as 
measuring  the  total  program  length  rather  than  the  number  of  instructions 
in  long-address  format.  Indeed,  this  is  the  very  motivation  for  counting 
long  instructions  rather  than  total  program  length.  Indeed,  if  we  modify 
the  predicate  FITS  so  that  FITSt(prgm, e.,k)  is  true  if  prgm  may  be  assembled 
in  k or  fewer  words  (total  program  length)  on  a machine  with  short  address 
span  of  l,  then  we  have: 

Theorem  5 For  any  e,  0 <_  c < 1,  the  problem  of  finding  -approximate 
solutions  to  < S/L-CODE,FITSt > , with  fixed  £,  is  WP-complete.  However, 
an  e -approximate  solution  for  e = 1 trivially  exists. 

Proof : Perform  the  construction  exactly  as  in  Theorem  4,  except  that  p is 
the  total  length  of  t-,  f-,  and  c-modules,  filler,  and  associated  variables  - 
that  is  everything  except  the  block  of  y's.  Now  add  a block  of  q y's 
where 

q > (2  + (1  + 2e)p)/(l  - e)  . 

Since  0<_e<l,  1-e  is  always  positive  so  this  restriction  is  equivalent 
to 

q > £ + p + e(2p  + q)  . 

Now  assume  F is  satisfiable.  Then 

$0  - 2P  + q • 

However,  if  an  approximate  solution  requires  the  y's  to  be  in  long  format, 
then 

$A  1 (q  - «•)  + p + q 

> p + e(2p  +q)+p+q^(l+  e)$Q  . 
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Hence  any  e -approximate  solution  for  a program  corresponding  to  a satisfiable 
1-  must  have  the  y's  in  short  format.  Again,  this  would  provide  a 
reduction  for  3 -CNF. 

As  noted  above,  <_  2-$Q  from  the  nature  of  the  problem.  The 
of  Theorem  4,  5.  = 151,  again  suffices.  □ 


-25- 


V.  Heuristic  Considerations 


The  previous  section  demonstrated  that  even  approximation  of  a rather 
broadly  constrained  version  of  the  problem  of  code  generation  was  NP-complete. 
The  implication  of  such  complexity  results  is  not  a purely  negative  one,  for 
it  directs  our  research  to  heuristics  and  approximations  which  are  likely 
to,  but  are  not  guaranteed  to,  provide  near-optimal  code.  In  this  section 
we  begin  this  attempt,  with  heuristics  for  the  placement  of  varibles  during 
compilation  or  assembly,  and  considerations  and  suggestions  for  future 
architecture,  language,  and  program  designers. 

Heuristics  for  the  placement  of  variables  can  be  exercised  by  the 
programmer  or  can  guide  manipulations  by  an  assembler  or  compiler.  The 
prime  consideration  for  a programmer  is  to  keep  logically  local  variables 
allocated  locally.  A common-place  technique,  which  is  certainly  economical 
in  FORTRAN  on  a large  machine,  is  to  declare  a few  temporary  variables  which 
are  used  repeatedly  throughout  a program- -using  few  words  instead  of  many. 
However,  in  machines  with  instructions  of  short/long  format  variation  this 
is  potentially  quite  wasteful.  Local  variables  should  be  used  as  much  as 
possible,  with  small  local  blocks  if  the  language  permits. 

Corollary  to  the  programmer  heuristic  for  local  variables  is  one 
recommending  placement  of  loop  indices  within/adjacent -to  loops.  Unless 
testing  is  performed  in  two  different  places,  a for-loop  structure  must 
have  an  unconditional  branch,  and  the  loop  index  may  be  placed  immediately 
following  this.  This  may  be  done  by  the  programmer  in  assembly  language 
or  by  the  compiler  in  a language  like  ALGOL  W (except  that  ALGOL  W is 
recursive)  in  which  a loop-index  is  implicitly  declared  for  the  body  of  the 
loop. 

Automatic  heuristics,  applied  by  an  assembler  or  compiler,  presuppose 
the  freedom  to  rearrange  storage.  The  natural  "greedy"  heuristic  is  to 
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count  the  number  of  times  each  variable  is  referenced  by  each  module  and  to 
associate  the  variable  with  the  module  which  references  it  most  frequently. 
Considerations  of  branch  instruction  are  omitted.  Not  only  are  "far 
away"  branches  relatively  unlikely  in  a well-written  program,  but  their 
impact  on  program  execution  time  (related  to  instruction  formats  because 
long  instructions  require  additional  time  or  complexity  of  instruction 
fetches)  is  even  further  restricted.  This  is  because  frequently  executed 
branches  are  in  tight-loop  situations  and  hence  are  nearby.  Exceptions  to 
the  above  are  subroutine  branches,  but  subroutines  in  this  context  should 
perhaps  be  treated  as  data  objects  whose  placement  is  optimal.  Moreover, 
the  "greedy"  heuristic  ignores  the  fact  that  the  placement  of  variables  will 
be  significant,  in  that  one  variable  placed  between  an  instruction  and  its 
operand  could  force  that  instruction  to  be  long  (as  in  sections  III  and  IV). 

In  general  this  is  not  worth  considering  for  simple  variables;  but  arrays 
and  other  structured  variables  should  be  weighted  by  the  number  of  references 
divided  by  size,  and  objects  of  highest  weight  placed  with  the  module  first. 
Itiis  is  intentionally  a vague  specification,  since  addressing  modes,  the 
definition  of  "module",  language  features,  etc.  will  strongly  influence  the 
actual  details.  It  is  easy  to  imagine  instances  where  this  heuristic  is 
arbitrarily  bad  (as  is  expected  from  section  IV).  However,  the  heuristic 
can  be  implemented  without  much  additional  overhead  as  part  of  a first  pass. 

If  we  wish  to  consider  branches  and  the  interactions  of  instructions, 
we  may  estimate  a cost  for  each  short/long  instruction  a:B  which  is  a 
function  of  all  y:6  dependent  on  <*:b  considering  a)  the  cost  of  >:6, 
b)  g.  - d(y : 6) , c)  a stage  in  an  iterative  approximation  of  the  cost  (since 
dependency  is  very  likely  to  be  cyclic).  A similar  cost  for  each  possible 
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placement  of  a data  object  could  be  computed  and  then  a modified  greedy 

heuristic  could  maximize  estimated  benefit/cost.  This  cost  estimation  could 

■> 

require  time  proportional  to  n“ , however. 

Any  automatic  heuristics  for  placement  of  data  require  language 
features  which  permit  movement  of  data  objects.  One  might  assume  that 
an  assembler  could  always  place  variables  after  unconditional  branches, 
but  this  could  wreak  havoc  in  the  middle  of  a branch  table  (or  in  other 
instances  of  calculated  addresses,  albeit  not  ones  appealing  to  current 
aesthetics).  Many  assembler  l;inguages  have  "literal  pool  here"  pseudo- 
operations, and  a similar  pseudo -operation  could  allow  the  programmer 
to  declare  sites  in  the  code  where  storage  for  variables  could  be  reserved. 
Scoping  might  present  some  problems,  but  there  is  no  reason  an  assembly 
language  cannot  be  block  structured.  Or  a notation  for  label  temporaries 
(comparable  to  the  ”3F"  and  ”7B"  labels  in  MIX  [ 8 ])  could  distinguish 
names  for  which  the  assembler  was  to  allocate  storage  from  those  the 
programmer  wished  to  control.  Higher-level  languages  of  course  free  the 
allocation  strategy  from  concern  about  explicitly  generated  addresses,  but 
for  those  which  dynamically  allocate  storage  or  force  program  and  data 
separation  these  considerations  are  of  course  irrelevant.  However,  a FORTRAN 
compiler  could  perform  optimization  not  only  with  user  variables  but  with  its 
own  temporaries  which  it  may  use  during  expressions  evaluation,  etc. 

Consideration  of  block  structured  but  non-recursive  languages  points- 
up  what  may  be  as  much  a deficiency  in  machine  architecture  as  in  language. 
With  block-structure  the  programmer  has  considerable  ability  to  restrict 
scope  of  temporaries,  but  the  common  addressing  range  restricts  the 

useful  size  of  blocks  with  local  variables.  The  usual  block  structure 
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restricts  declarations  to  beginning  of  blocks,  and  it  is  natural  for  com- 
pilers to  allocate  storage  immediately  (and  necessary  for  a one-pass 
compiler).  This  means  that  data  references  may  never  make  use  of  the  forward 
range  of  short  instructions.  Three  solutions  to  this  seem  to  eXiSt.  One 
lies  with  the  compiler,  at  the  expense  of  considerable  complication  and 
perhaps  another  pass.  This  is  to  allocate  storage  in  the  middle  of  the 
block.  The  second  solution  is  architectural  and  not  out-of-the  question  on 
microprogrammed  machines.  This  is  to  have  short  data  references  in  the  range 
-2e  to  0,  perhaps  retaining  the  -£  to  +£  range  for  branches.  Finally, 
short  data  references  may  be  displacement  from  a base  address  in  a system 
or  progranmer  controlled  register. 

Another  observation  about  short/long  addresses  is  that  indirection 
(as  in  the  PDP8)  is  better  than  purely  an  extra  word  with  a long  address. 
Indirection  within  the  til  range  to  a word  containing  the  full  address  of 
a variable  allows  many  instructions  to  share  that  full  address.  If  a page 
0 or  other  common  segment  is  addressable  by  a short  instruction,  even  more 
sharing  is  possible.  Indeed  the  Prime  system  makes  use  of  such  a shared 
with  addresses  placed  at  link/load  time.  Optimisation  at  link/load 
time  is  an  interesting  area  for  general  consideration. 

Concern  about  execution  (i.e.,  fetch)  time  rather  than  merely  program 
size  has  been  mentioned  before  the  preceding  paragraphs,  but  it  is  fitting 
that  this  section  should  end  stressing  this  point.  It  would  be  quite  foolish, 
in  execution  cost,  for  an  allocation  strategy  to  place  a variable  with  a 
routine  executed  only  on  rare  exceptions  rather  than  with  a major  loop. 
Therefore,  it  is  necessary  either  to  predict  probabilities  of  program  flow 
based  on  reasonable  programming  style  or  to  provide  facilities  for  the 
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programmer  to  indicate  allocation  preferences  (cf.  the  FREQUENCY  statement 
of  early  FORTRAN).  There  may  even  be  a time-space  tradeoff,  where  constants 
or  modules  a replicated  to  shorten  instructions  and  thus  save  execution 
time,  but  where  the  replication  uses  more  space  than  is  saved  by  short 
instruction  formats. 
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APPENDIX 


The  algorithm  given  in  this  appendix  is  a more  complete  version  of 
the  algorithm  given  in  section  II.  The  algorithm  for  deciding  whether  a 
particular  instruction  is  long  or  short  format  is  presented  in  detail. 

It  is  actually  to  be  incorporated  in  the  first  pass  of  an  assembler, 
handling  instructions  after  they  have  been  processed  by  the  scanner, 
symbols  entered  into  the  table,  etc.  For  sinplicity,  the  algorithm  assumes 
that  all  assembly  language  instructions  are  of  the  same  form  — single 
operand  with  the  option  of  either  short  or  long  format  machine  represen- 
tations . 

The  critical  idea  for  the  heuristics  of  the  algorithm  is  to  determine 
inmediately , if  possible,  whether  an  instruction  must  be  short  or  long  for- 
mat independant  of  the  formats  of  all  other  instructions  --  which  we  call 
"certainly"  long  or  short.  An  instruction  is  "certainly  long"  if  at  least 
£ words  of  source  program  (generally  assumed  to  be  in  short  format) 
intervene  between  the  instruction  and  its  operand.  On  the  other  hand,  an 
instruction  is  "certainly  short"  if  the  mrnber  of  intervening  words  plus 
the  number  of  intervening  instructions  vhose  format  is  yet  unknown  (and 
vhich,  therefore,  could  expand  requiring  an  extra  word)  is  less  than  l.  It 
is,  of  course,  easy  to  test  these  conditions  for  instructions  which  reference 
backwards , but  a forward  reference  requires  an  t word  "look-ahead." 

The  "look-ahead"  is  actually  accomplished  by  l instruction  buffers  (see 
Figure  13).  The  buffers  are  l instructions  rather  than  merely  t words  in 
length  in  order  to  facilitate  synchronization  of  the  processes  filling  and 
errp tying  the  buffers.  Backward  reference  instructions  are  classified  as 
certainly  long  or  short  by  ENTER. RIGHT,  which  accepts  all  input  and  fills  the 
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4 Motion  of  cc3  thru  buffers  <r 


ENTER- RIGHT  accepts  inputs  from  scanner  and  fills  BR 

MIDDLE  enpties  BR  and  fills  BL 

EXIT -LEFT  empties  BL  and  writes  output 

Figure  13:  Buffers  and  procedures  for  Algorithm  II 

right  buffer  BR.  The  procedure  MIDDLE  takes  input  off  BR  and  places 
instructions  on  BL,  the  left  buffer,  after  processing.  Forward  reference 
instructions  can  be  classified  certainly  lcng/ short  by  MIDDLE,  since  if  the 
operand  is  not  among  the  l instructions  in  BR  then  the  format  is  certainly  long. 
MIDDLE  also  places  all  backward  reference  instructions,  except  those  whose  format 
is  known  based  on  the  information  in  BL,  into  the  list  structures  corresponding 
to  those  set  up  during  the  initialization  phase  of  Algorithm  I (this  is 
part  of  the  initialization  "on  the  fly").  Deferring  the  entering  of  these 
instructions  until  this  point  means  that  all  certainly  long/short  instruc- 
tions in  their  scope  have  been  discowred.  For  a similar  reason,  forward 
reference  instructions  are  not  entered  on  to  the  lists  until  they  are  re- 
moved from  BL  by  EXIT-LEFT. 

Since  the  length  of  generated  code  appears  to  increase  as  more  instruc- 
tions are  forced  to  expand  to  long  format,  each  of  the  three  procedures 
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ENTER- RIGHT,  MIDDLE,  and  EXIT-LEFT  keeps  its  own  location  cointer  (at  least 
potentially,  the  location  counter  is  not  explicitly  kept  in  EXIT-LEFT,  but 
is  kept  in  ENTER -RIGHT  and  MIDDIE  as  PR  and  PM  respectively).  By  biasing 
these  location  cointers  by  distinct  multiples  of  100000  (presumed  to  be 
longer  than  l or  the  length  of  any  generated  code) , it  is  possible  to  make 
canparisons  which  test  the  relative  position  of  two  labels  even  though  they 
have  been  assigned  positions  by  different  procedures. 


where  label  gets  position  value  bias 

undefined  300000 

ENTER- RIGHT  200000 

MIDDLE  100000 

EXIT-LEFT  0 

Table  2 

For  example,  the  test 

P[a]  <P[B] 


is  actually  a combination  of  two  tests:  has  a passed  a "station”  in 
the  buffers  which  8 has  not,  or  have  they  both  passed  the  same  station 
with  a preceding  8 . Of  course  it  is  occas s icnal ly  necessary  to  ensure 
the  same  bias  is  used  vhen  con-paring  two  positions  — hence  the  two  different 
times  when  P and  N are  assigned  values  in  the  procedure  MIDDLE,  imnediately 
in  the  case  of  a backward  reference  but  not  until  the  completion  of  processing 
a forward  reference. 

We  now  give  the  algorithm  in  a slightly  unconventional  but  hopefully 
understandable  notation.  In  particular,  manipulations  of  lists,  buffers, 
files,  queues,  and  records  (the  preferred  implementation  of  C,D,P,L[o])  are 
largely  descriptive. 
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data  structures 


/*  = = ==  = 


= = = = */ 


buffer  BR,  BL  for  H instructions; 


for  each  label  c { 

integer  C[  c ] initially 
integer  D[  c ] initially 
integer  P[ c j initially 
list  L[ e ] initially 


0 /*  count  */; 

0 /*  distance  •/; 

300000  /*  position  ♦/; 
null  ) ; 


/*  PR  and  PS  are  position  estiiates  for  right  and  Biddle, 
respetively,  biased  by  their  initial  values.  Undefined 
position,  in  P[  • ],  is  300000  */ 

inteqer  PR  initially  200000; 
integer  PS  initially  100000; 


/*  IB  and  MS  count  the  nuaber  of  yet  unaarked  instructions 
(i. e.  instructions  for  which  the  required  length  ha3  not 
bean  determined)  at  the  right  and  tiddle  respectively  */ 

Integer  MB,  MS  initially  0; 
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/*  ====  spec ia led  procedures  ====  ♦ / 

Boolean  procedure  L0NG(9,(>) 

/*  returns  true  IFF  is  certainly  long 

from  currently  known  information*/ 

IPf»]  * ?!♦]!  > 


Boolean  procedure  SHORT  (0, ♦) 

/*  returns  true  IPP  is  certainly  short  */ 

IP[»]  * P[*]l  ♦ |N[8]  - «(♦]!  < 


procedure  LONG-DELIST  (") 

/♦corresponds  to  the  main  loop  of  algorithm  I.  Ti 
instruction  labelled  c is  made  long  ("marked")  and  th 
bookkeeping  of  all  y:S  which  hare  c in  their  scope  is 
updated.  If  this  forces  K:S  to  be  long,  then  the 
procedure  is  repeated  with  y,  etc.*/ 

( queue  0; 
put  " on  Q; 
for  each  y on  Q 

(mark  y as  long; 
for  each  1 on  L[  Y ] 

(D[1]  <-  D[f]  ♦ 1;  cm  «-  C[f  J - 1; 
if  D[  f 1 = jf  ♦ d then  place  H on  Q) 

t } ; 


procedure  SHDRT*D3LIST(*) 

/♦similar  to  the  main  loop  of  algorithm  I and  the 
procedure  LONG*  DEL  1ST,  ercept  the  instruction  labelled  i 
is  made  short  */ 

{ queue  0; 
put  m on  Q; 
for  each  Y on  a 

(mark  y as  short; 
for  each  1 on  L[K] 

c=mi  <-  cm  - i; 

if  C[t]  ♦ D[¥]  = / then  put  H on  3) 

); 


BEST  AVAILABLE  COPY 
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wain  "coroutines" 


/* 


V 


procedure  ENTER*  RIGHT 

( t:p  gets  next  instruction  fro*  file; 

P[  x]  4-  PR;  N[  x ] 4-  NR; 

PR  4-  PP  ♦ 1; 

if  P[«l  - Pf  p ] > / /"only  backwards  ref.  sat'f  y*/ 
then  ( nark  c:(3  as  long;  PR  4-  PR  * 1| 
else  NR  4-  NR  ♦ 1; 
place  *:p  on  BR) ; 


procedure  MIDDLE 
( PM  4-  PM  ♦ 1; 

*;1  4-  next  instruction  fro*  BR;  place  c;3  on  BL; 
if  «:p  is  larked  long 
then  PH  4-  PH  ♦ 1 

else  if  Pfx  ] < Pf p ] /"forward  reference  */ 
then  { 

if  LON'S  (x,p)  /"conpare  based  on  PR  values  */ 
then  { PM  4-  PH  ♦ 1;  aark  x;P  as  long  | 
else  if  SHORT  («, p) 

then  iark  *:jd  as  short 

else 

NH  4-  NH  ♦ 1 ; 

P[X]  4-  PH;  N[  K ] 4-  NH) 
else  /*  backward  reference  */ 

{ PfT  c ] 4-  PH;  N[«]  4-  NH; 

if  LONS (c, p)  /"based  on  new  values*/ 

than  { PH  4-  PM  ♦ 1;  iark  *:(3  as  long  } 
else  if  SHORT(«,p) 

then  nark  x : 3 as  short 
else 

(integer  T initially  0; 

for  each  umarked  y:S  oa  BL  such  that  Pf  Y ] > P'3] 
{ T 4-  T ♦ 1;  enter  x on  L[K]|  ; 

C[«]  4-  ?;  D[c]  4-  P(«]  - PC  3 ]; 

NH  4-  NH  ♦ 1) 
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proceiure  EX IT-LEFT 

( e:p  <-  next  instruction  fro*  BL; 
if  s:p  is  unsarked  and  P[c]  < P[  0 ] 
t hen 

if  LONG(*,p> 

then  LONG -DEL  1ST («) 
else  if  SHORT(s,0) 

then  SHORT-DELIST  («) 
else 

f integer  TN,  TP  initially  0; 
for  each  K:S  on  BL  such  that  P[jr]  < P'  p ] 
if  Yz 8 is  narked  short 
then  TP  <-  TP  ♦ 1 
else  if  K:S  is  sacked  long 
then  TP  <-  TP  ♦ 2 
else 

( TP  <-  TP  ♦ 1;  TS  «-  TN  ♦ 1)  ; 
if  TP  ♦ TN  < K 

then  SHOHT-DELIST(S) 
else 

for  each  y:S  on  BL  sach  that  P[XJ  < P[0] 
if  /:&  is  unsarked 

then  enter  < on  L[/]; 

D[«]  <-  TP;  C[c]  «-  TN  | 


C6PY  AVAILABLE  TO  ODC  MB  WJ 
PERMIT  FULLY  LtCiSLE  PRQMCTWi 
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/*  =====  mainline  program  = = = = */ 

/*  the  aain  program  is  essentially  cycling  thraught  the 
three  "coroutines"  ENTER*RIGNT,  MIDDLE,  and  EXIT*LEFr. 
There  is,  however,  some  initial  and  final  control  sinre 
ENTER*RIGHT  eust  be  executed  4 tines  before  the  right  buffer 
is  filled  and  MIDDLE  begins  executing,  etc.  */ 

/•  initial  filling  of  buffers  */ 
for  N <-  1 to  / 
do  ENTER*RIGHT; 
for  H «-  1 to  / 

do  ( ENTER*PIGHT;  MIDDLE  | ; 

/*  process  further  input  */ 

While  INPOr»REHMHIMG 

do  ( ENTER* R IGH T;  MIDDLE;  EXIT*LEPT  | ; 

/•  final  flushing  of  buffers  */ 
far  M-  1 ta  / 

do  ( MIDDLE;  EX IT*L EFT  }; 
for  N «-  1 to  H 
do  EXIT-LEFT. 
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The  above  algorithm,  as  specified,  provides  a method  of  deciding  whether 
each  instruction  is  short  or  long  format,  but  does  not  apply  this  information 
to  the  actual  generation  of  instructions . A particular  problem  is  the  fact 
that  the  information  about  instruction  length,  in  either  version  of  the 
algorithm,  is  not  produced  in  the  order  of  the  program  text.  Ihis  would  not 
be  particularly  mportant  if  the  label  a of  instruction  a:g  was  actually 
a user-defined  label  and  hence  resided  in  the  symbol  table.  But  a is  likely 
to  be  generated  by  the  assembler  — for  example  the  statement  nunber  - -and 
there  would  be  far  too  many  generated  labels  to  fit  in  a reasonably  sized 
synbol- table. 

One  of  the  major  advantages  of  the  above  algorithm,  however,  is  that  the 
format  of  "most"  instructicns  is  determined  before  they  leave  the  left  buffer. 
A final  tentative  location  counter  (PL  is  the  above  notation  although  not 
actually  part  of  the  algorithm)  would  be  kept  by  EXIT- LEFT  and  assigned  to 
P[a]  as  a:&  leaves  the  buffer  if  a is  a user  defined  symbol.  Still  assum- 
ing unmarked  instructions  are  short,  it  is  however  necessary  to  adjust  for 
instructions  discovered  to  be  long  by  LONG-DELIST  after  they  have  left  LB. 

In  particular,  a file  FIXUP  should  be  created  to  hold  adjustment  infor- 
mation and  the  statement 

"mark  y as  long" 
in  LONG- DELIST  should  be  augmented  to 

"if  Y is  in  BL  then  mark  y as  long 
B . else  output  y on  file  FIXUP". 

At  the  end  of  the  first  assembly  pass  the  file  FIXUP  is  then  sorted  in 
statement  nuiber.  A linear  pass  thru  this  sorted  file  will  then  determine 
an  adjustment  factor  which  is  added  to  the  location  of  user-defined  synbol. 


-41- 


I 


This  adjustment  can  be  thought  of  as  the  operation 

"if  the  statement  nurber  of  user  defined  label  y 
is  in  the  range  to  S then  the  actual 

position  of  y is  P[y]  + a^  " . 

It  is,  of  course,  efficiently  inplemented  as  a binary  search.  "Origin" 
psuedo  operations  ihich  reset  the  assemblers  location  counter  rmst  also 
be  included  in  FIXUP  if  they  are  more  complicated  than  setting  the  location 
cointer  to  a constant  or  to  the  old  value  of  the  location  counter  plus/minus 
a consort.  Few  uses  of  "origin"  pseudo-operations  would  violate  these  con- 
ditions. They  must  in  any  case  be  restricted  to  expressions  containing  user 
defined  syrbol(s)  which  have  been  previously  defined  in  the  program  text, 
for  otherwise  extra  pass(es)  would  be  required.  The  user  defined  variables 
will  thus  have  adjustment  already  associated  with  them,  which  can  be  used 
to  compute  the  adjustment  corresponding  to  the  "origin." 
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